Model-Free Reinforcement Learning
Model-Free Reinforcement Learning with the Decision-Estimation Coefficient
We consider the problem of interactive decision making, encompassing structured bandits and reinforcement learning with general function approximation. Recently, Foster et al. (2021) introduced the Decision-Estimation Coefficient, a measure of statistical complexity that lower bounds the optimal regret for interactive decision making, as well as a meta-algorithm, Estimation-to-Decisions, which achieves upper bounds in terms of the same quantity. Estimation-to-Decisions is a reduction, which lifts algorithms for (supervised) online estimation into algorithms for decision making. In this paper, we show that by combining Estimation-to-Decisions with a specialized form of "optimistic" estimation introduced by Zhang (2022), it is possible to obtain guarantees that improve upon those of Foster et al. (2021) by accommodating more lenient notions of estimation error. We use this approach to derive regret bounds for model-free reinforcement learning with value function approximation, and give structural results showing when it can and cannot help more generally.
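To make the reduction concrete, the following is a minimal sketch, in our own notation, of the Estimation-to-Decisions loop for a finite decision space and model class with a squared-error estimation loss: an online estimator proposes a model, and the decision distribution solves the Decision-Estimation Coefficient min-max objective, here as a small linear program. The toy reward table, the crude estimator, and the parameter gamma are all hypothetical; the paper's general algorithm and its optimistic variant differ in how the estimation error enters the objective.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# Toy setting: 3 decisions, 4 candidate models; f[M][pi] is the mean
# reward that model M assigns to decision pi (made-up numbers).
f = np.array([[0.2, 0.5, 0.1],
              [0.6, 0.3, 0.2],
              [0.1, 0.2, 0.7],
              [0.4, 0.4, 0.4]])
true_model = 2      # the environment draws rewards from this row
gamma = 10.0        # exploration/exploitation trade-off parameter

def dec_distribution(f, f_hat, gamma):
    """Solve min_p max_M E_{pi~p}[ f_M(pi_M) - f_M(pi)
    - gamma * (f_M(pi) - f_hat(pi))^2 ] as a small linear program."""
    n_models, n_dec = f.shape
    g = f.max(axis=1, keepdims=True) - f - gamma * (f - f_hat) ** 2
    # Variables [p_1..p_n, t]: minimize t subject to g_M @ p <= t.
    c = np.r_[np.zeros(n_dec), 1.0]
    res = linprog(c,
                  A_ub=np.c_[g, -np.ones(n_models)], b_ub=np.zeros(n_models),
                  A_eq=np.r_[np.ones(n_dec), 0.0][None, :], b_eq=[1.0],
                  bounds=[(0, None)] * n_dec + [(None, None)])
    return res.x[:n_dec]

history = []  # (decision, observed reward) pairs
for t in range(200):
    # Online estimation step: a crude estimator that picks the model
    # with the smallest cumulative squared error on the data so far.
    if history:
        pis, rs = map(np.array, zip(*history))
        f_hat = f[((f[:, pis] - rs) ** 2).sum(axis=1).argmin()]
    else:
        f_hat = f.mean(axis=0)
    p = dec_distribution(f, f_hat, gamma)       # decision step
    pi = rng.choice(len(p), p=p / p.sum())
    r = f[true_model, pi] + 0.1 * rng.standard_normal()
    history.append((pi, r))

freq = np.bincount([h[0] for h in history], minlength=f.shape[1]) / len(history)
print("empirical play frequencies:", freq)
```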
Integrating Model-Based Footstep Planning with Model-Free Reinforcement Learning for Dynamic Legged Locomotion
Lee, Ho Jae, Hong, Seungwoo, Kim, Sangbae
In this work, we introduce a control framework that combines model-based footstep planning with Reinforcement Learning (RL), leveraging desired footstep patterns derived from the Linear Inverted Pendulum (LIP) dynamics. Utilizing the LIP model, our method forward predicts robot states and determines the desired foot placement given the velocity commands. We then train an RL policy to track the foot placements without following the full reference motions derived from the LIP model. This partial guidance from the physics model allows the RL policy to integrate the predictive capabilities of the physics-informed dynamics and the adaptability characteristics of the RL controller without overfitting the policy to the template model. Our approach is validated on the MIT Humanoid, demonstrating that our policy can achieve stable yet dynamic locomotion for walking and turning. We further validate the adaptability and generalizability of our policy by extending the locomotion task to unseen, uneven terrain. During the hardware deployment, we have achieved forward walking speeds of up to 1.5 m/s on a treadmill and have successfully performed dynamic locomotion maneuvers such as 90-degree and 180-degree turns.
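The LIP-based prediction step can be pictured with the closed-form solution of the linear inverted pendulum. The sketch below is our simplification, not the paper's planner: it forward-predicts the sagittal CoM state to touchdown and applies one standard placement rule (instantaneous capture point plus a velocity-tracking correction); the CoM height, gains, and exact rule are assumptions.

```python
import numpy as np

G = 9.81   # gravity [m/s^2]
H = 0.60   # assumed CoM height [m] (hypothetical value)
OMEGA = np.sqrt(G / H)

def lip_forward_predict(x, xdot, t):
    """Closed-form solution of the 1-D LIP, x'' = (g/h) x, where x is
    the CoM position measured from the current stance foot."""
    c, s = np.cosh(OMEGA * t), np.sinh(OMEGA * t)
    return x * c + xdot / OMEGA * s, x * OMEGA * s + xdot * c

def desired_foot_placement(x, xdot, v_cmd, t_step, k_v=0.1):
    """Forward-predict the CoM state at touchdown, then place the swing
    foot at the capture point plus a velocity-tracking correction."""
    x_td, xdot_td = lip_forward_predict(x, xdot, t_step)
    return x_td + xdot_td / OMEGA + k_v * (xdot_td - v_cmd)

# CoM 2 cm ahead of the stance foot, moving at 1.0 m/s, commanded
# velocity 1.5 m/s, 0.3 s until touchdown:
print(desired_foot_placement(0.02, 1.0, 1.5, 0.3))
```

An RL policy trained to track such placements inherits the LIP model's predictive structure while remaining free in how it realizes each step, which is the partial guidance the abstract describes.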
Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems
Cai, Tianchi, Bao, Shenliao, Jiang, Jiyan, Zhou, Shiji, Zhang, Wenpeng, Gu, Lihong, Gu, Jinjie, Zhang, Guannan
Model-free RL-based recommender systems have recently received increasing research attention due to their capability to handle partial feedback and long-term rewards. However, most existing research has ignored a critical feature of recommender systems: one user's feedback on the same item at different times is random. This stochastic reward property differs essentially from the deterministic rewards of classic RL scenarios, which makes RL-based recommender systems much more challenging. In this paper, we first demonstrate in a simulator environment that using direct stochastic feedback results in a significant drop in performance. Then, to handle the stochastic feedback more efficiently, we design two stochastic reward stabilization frameworks that replace the direct stochastic feedback with a reward learned by a supervised model. Both frameworks are model-agnostic, i.e., they can effectively utilize various supervised models. We demonstrate the superiority of the proposed frameworks over different RL-based recommendation baselines with extensive experiments on a recommendation simulator as well as an industrial-level recommender system.
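A minimal tabular caricature of the stabilization idea, assuming the simplest possible "supervised model" (a running per-pair mean standing in for a learned click predictor): the TD update consumes the model's smoothed reward instead of the raw stochastic feedback. Everything here is our illustration, not the paper's two frameworks.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3

# Hypothetical true mean rewards; the observed feedback is a random 0/1
# draw around them, mimicking a user reacting differently to the same
# item at different times.
mean_r = rng.uniform(0, 1, size=(n_states, n_actions))

# "Supervised reward model": a running per-(state, action) mean, standing
# in for any learned click/score predictor the frameworks could plug in.
r_hat = np.zeros((n_states, n_actions))
counts = np.zeros((n_states, n_actions))

Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.2

s = 0
for t in range(50000):
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
    r_obs = float(rng.random() < mean_r[s, a])     # raw stochastic feedback
    counts[s, a] += 1
    r_hat[s, a] += (r_obs - r_hat[s, a]) / counts[s, a]   # fit the reward model
    s_next = int(rng.integers(n_states))           # toy transitions
    # Stabilized TD update: consume the model's smoothed reward, not r_obs.
    Q[s, a] += alpha * (r_hat[s, a] + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

mismatch = (Q.argmax(axis=1) != mean_r.argmax(axis=1)).sum()
print("states where the greedy action misses the best item:", mismatch)
```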
Omega-Regular Reward Machines
Hahn, Ernst Moritz, Perez, Mateo, Schewe, Sven, Somenzi, Fabio, Trivedi, Ashutosh, Wojtczak, Dominik
Reinforcement learning (RL) is a powerful approach for training agents to perform tasks, but designing an appropriate reward mechanism is critical to its success. However, in many cases, the complexity of the learning objectives goes beyond what the Markovian assumption can capture, necessitating a more sophisticated reward mechanism. Reward machines and omega-regular languages are two formalisms used to express non-Markovian rewards for quantitative and qualitative objectives, respectively. This paper introduces omega-regular reward machines, which integrate reward machines with omega-regular languages to enable an expressive and effective reward mechanism for RL. We present a model-free RL algorithm to compute epsilon-optimal strategies against omega-regular reward machines and evaluate the effectiveness of the proposed algorithm through experiments.
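A reward machine can be pictured as a small automaton whose transitions over atomic-proposition labels emit rewards; RL then runs on the product of the environment state and the machine state, which restores the Markov property. The sketch below, with an invented two-step task, shows a plain reward machine only; the paper's omega-regular extension additionally carries an acceptance condition, which we omit.

```python
from dataclasses import dataclass

@dataclass
class RewardMachine:
    """A reward machine: an automaton over atomic-proposition labels whose
    transitions emit scalar rewards, encoding a non-Markovian objective in
    the machine state."""
    delta: dict   # (machine_state, label) -> next machine state
    rho: dict     # (machine_state, label) -> reward on that transition
    u0: str = "u0"

    def step(self, u, label):
        return self.delta[(u, label)], self.rho.get((u, label), 0.0)

# "Get coffee, then deliver the mail": reward 1 only for the full sequence.
rm = RewardMachine(
    delta={("u0", "coffee"): "u1", ("u0", "none"): "u0",
           ("u1", "mail"): "u2", ("u1", "none"): "u1",
           ("u2", "none"): "u2"},
    rho={("u1", "mail"): 1.0},
)

# RL runs on the product state (env_state, machine_state), which makes the
# reward Markovian again; the agent conditions its policy on both parts.
u = rm.u0
for label in ["none", "coffee", "none", "mail"]:
    u, r = rm.step(u, label)
    print(label, "->", u, "reward", r)
```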
Model-free Reinforcement Learning of Semantic Communication by Stochastic Policy Gradient
Beck, Edgar, Bockelmann, Carsten, Dekorsy, Armin
Motivated by the recent success of Machine Learning tools in wireless communications, the idea of semantic communication by Weaver from 1949 has gained attention. It breaks with Shannon's classic design paradigm by aiming to transmit the meaning, i.e., semantics, of a message instead of its exact version, allowing for information rate savings. In this work, we apply the Stochastic Policy Gradient (SPG) to design a semantic communication system by reinforcement learning, not requiring a known or differentiable channel model - a crucial step towards deployment in practice. Further, we motivate the use of SPG for both classic and semantic communication from the maximization of the mutual information between received and target variables. Numerical results show that our approach achieves comparable performance to a model-aware approach based on the reparametrization trick, albeit with a decreased convergence rate.
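The channel-model-free training can be illustrated with a REINFORCE-style update on a toy discrete channel: the transmitter samples a symbol from a stochastic policy, only samples (no gradients) pass through the channel, and the score-function gradient needs just the log-probability of the transmitted symbol. This is our toy, with a lookup-table receiver and message identity as the "semantic" success signal; the paper uses learned encoders over more realistic channels, and a sketch like this may need tuning to converge reliably.

```python
import numpy as np

rng = np.random.default_rng(0)
n_msgs, n_syms = 4, 4          # messages to convey; discrete channel symbols
W = np.zeros((n_msgs, n_syms)) # transmitter policy logits
lr, p_err = 0.5, 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def channel(s):
    """Non-differentiable channel: outputs a random symbol with prob p_err.
    Only samples are available; no gradient flows through this function."""
    return rng.integers(n_syms) if rng.random() < p_err else s

counts = np.ones((n_syms, n_msgs))  # receiver: ML lookup from co-occurrences

for t in range(20000):
    m = rng.integers(n_msgs)
    pi = softmax(W[m])
    s = rng.choice(n_syms, p=pi)    # sample from the stochastic transmit policy
    y = channel(s)
    m_hat = int(counts[y].argmax()) # receiver decodes the noisy symbol
    counts[y, m] += 1               # supervised update of the receiver
    reward = 1.0 if m_hat == m else 0.0   # "semantic" success = right message
    # Score-function (REINFORCE) update: grad log pi(s|m) * reward.
    W[m] += lr * reward * (np.eye(n_syms)[s] - pi)

test = [int(counts[channel(int(W[m].argmax()))].argmax()) == m
        for m in range(n_msgs) for _ in range(200)]
print("decoding accuracy over the noisy channel ~", np.mean(test))
```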
Model-Free Reinforcement Learning for Asset Allocation
Asset allocation (or portfolio management) is the task of determining how to optimally allocate funds of a finite budget into a range of financial instruments/assets such as stocks. This study investigated the performance of reinforcement learning (RL) when applied to portfolio management using model-free deep RL agents. We trained several RL agents on real-world stock prices to learn how to perform asset allocation. We compared the performance of these RL agents against some baseline agents. We also compared the RL agents among themselves to understand which classes of agents performed better. From our analysis, RL agents can perform the task of portfolio management since they significantly outperformed two of the baseline agents (random allocation and uniform allocation). Four RL agents (A2C, SAC, PPO, and TRPO) outperformed the best baseline, MPT, overall. This shows the abilities of RL agents to uncover more profitable trading strategies. Furthermore, there were no significant performance differences between value-based and policy-based RL agents. Actor-critic agents performed better than other types of agents. Also, on-policy agents performed better than off-policy agents because they are better at policy evaluation and sample efficiency is not a significant problem in portfolio management. This study shows that RL agents can substantially improve asset allocation since they outperform strong baselines. On-policy, actor-critic RL agents showed the most promise based on our analysis.
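For readers unfamiliar with the setup, a portfolio-management environment reduces to: the action is a weight vector over assets and the per-step reward is the (log) return of the weighted portfolio. The environment below is our minimal illustration on synthetic prices, not the study's environment, shown with the uniform-allocation baseline it mentions.

```python
import numpy as np

rng = np.random.default_rng(0)

class PortfolioEnv:
    """Minimal asset-allocation environment (our illustration): the action
    is a weight vector over assets, the reward is the log return of the
    resulting portfolio for one period."""
    def __init__(self, prices):
        self.rets = prices[1:] / prices[:-1] - 1.0   # simple returns
        self.t = 0

    def reset(self):
        self.t = 0
        return self.rets[self.t]

    def step(self, weights):
        w = np.clip(weights, 0, None)
        w = w / w.sum()                              # long-only, fully invested
        reward = np.log1p(w @ self.rets[self.t])     # log portfolio return
        self.t += 1
        done = self.t >= len(self.rets) - 1
        return self.rets[self.t], reward, done

# Synthetic prices for 3 assets; uniform allocation as the baseline agent.
prices = np.cumprod(1 + 0.001 + 0.02 * rng.standard_normal((253, 3)), axis=0)
env, total = PortfolioEnv(prices), 0.0
obs, done = env.reset(), False
while not done:
    obs, r, done = env.step(np.ones(3) / 3)
    total += r
print("uniform-allocation cumulative log return:", total)
```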
Model-Free Reinforcement Learning for Optimal Control of Markov Decision Processes Under Signal Temporal Logic Specifications
Kalagarla, Krishna C., Jain, Rahul, Nuzzo, Pierluigi
We present a model-free reinforcement learning algorithm to find an optimal policy for a finite-horizon Markov decision process while guaranteeing a desired lower bound on the probability of satisfying a signal temporal logic (STL) specification. We propose a method to effectively augment the MDP state space to capture the required state history and express the STL objective as a reachability objective. The planning problem can then be formulated as a finite-horizon constrained Markov decision process (CMDP). For a general finite horizon CMDP problem with unknown transition probability, we develop a reinforcement learning scheme that can leverage any model-free RL algorithm to provide an approximately optimal policy out of the general space of non-stationary randomized policies. We illustrate the effectiveness of our approach in the context of robotic motion planning for complex missions under uncertainty and performance objectives.
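The state-augmentation step can be sketched for the simplest STL fragment, an "eventually reach the goal within the horizon" specification: one extra bit records whether the specification has been satisfied, the reward is 1 exactly when it first becomes true, and a finite-horizon Q-learner then estimates the satisfaction probability. This toy, with invented dynamics, omits the CMDP constraint machinery the paper adds on top.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D chain MDP; the STL-style spec "eventually reach cell 4 within
# the horizon" becomes a reachability objective over the augmented state
# (cell, reached-bit), where the extra bit records the needed history.
n_cells, horizon, goal = 5, 10, 4

def env_step(cell, a):
    """a in {-1, +1}; the move is flipped with probability 0.1."""
    if rng.random() < 0.1:
        a = -a
    return int(np.clip(cell + a, 0, n_cells - 1))

def augment(cell, reached):
    return cell, int(reached or cell == goal)

Q = np.zeros((horizon, n_cells, 2, 2))   # (time, cell, reached-bit, action)
alpha, eps = 0.2, 0.2
for ep in range(20000):
    cell, reached = augment(0, 0)
    for t in range(horizon):
        a = int(rng.integers(2)) if rng.random() < eps else int(Q[t, cell, reached].argmax())
        nxt, n_reached = augment(env_step(cell, 2 * a - 1), reached)
        r = float(n_reached and not reached)   # 1 when the spec first holds
        nxt_val = Q[t + 1, nxt, n_reached].max() if t + 1 < horizon else 0.0
        Q[t, cell, reached, a] += alpha * (r + nxt_val - Q[t, cell, reached, a])
        cell, reached = nxt, n_reached

# The optimal value at the initial augmented state is exactly the maximal
# probability of satisfying the spec, to be checked against a lower bound.
print("estimated max P(satisfy spec):", Q[0, 0, 0].max())
```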
Model-free Reinforcement Learning for Robust Locomotion Using Trajectory Optimization for Exploration
Bogdanovic, Miroslav, Khadiv, Majid, Righetti, Ludovic
However, exploration remains a serious challenge in RL, especially for legged locomotion control, mainly due to the sparse rewards in problems with contact as well as the inherent under-actuation and instability of legged robots. Furthermore, to successfully transfer learned control policies to real robots, there is still no consensus among researchers about the choice of the action space [4] and what (and how) to randomize [5] in the training procedure to generate robust policies.

Hence, losing the time dependence from the demonstration trajectories in the final feedback policy is the key in our approach to provide robustness with respect to contact timing uncertainties.

A. Related work

Demonstrations have long been used in dealing with exploration issues in reinforcement learning for robotic tasks [11], [12], [13]. Recently, demonstrations have been used as …
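The "losing the time dependence" idea above can be illustrated in a few lines: instead of indexing the demonstration by time, regress actions on states alone, so the resulting feedback policy reacts to where the system actually is and tolerates timing mismatch at contact. The synthetic data and the linear policy below are our invention, not the authors' RL pipeline.

```python
import numpy as np

# A time-indexed demonstration, e.g. from trajectory optimization:
# states and actions along one nominal cycle (synthetic stand-ins here).
T = 200
phase = np.linspace(0, 2 * np.pi, T)
demo_states = np.c_[np.sin(phase), np.cos(phase)]
demo_actions = np.c_[np.cos(phase)]

# Time-independent feedback policy: regress actions on states only and
# drop the time index, so the controller has no clock and responds to
# where the robot actually is; off-schedule (e.g. late-contact) states
# still map to sensible actions.
X = np.c_[demo_states, np.ones(T)]            # linear features plus bias
W, *_ = np.linalg.lstsq(X, demo_actions, rcond=None)

def policy(state):
    return np.r_[state, 1.0] @ W

# A state that never occurs at this exact time in the demonstration:
print(policy(np.array([0.3, 0.9])))
```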
Efficient Model-free Reinforcement Learning in Metric Spaces
Model-free Reinforcement Learning (RL) algorithms such as Q-learning [Watkins, Dayan 92] have been widely used in practice and can achieve human-level performance in applications such as video games [Mnih et al. 15]. Recently, equipped with the idea of optimism in the face of uncertainty, Q-learning algorithms [Jin, Allen-Zhu, Bubeck, Jordan 18] have been proven to be sample efficient for discrete tabular Markov Decision Processes (MDPs), which have a finite number of states and actions. In this work, we present an efficient model-free Q-learning based algorithm for MDPs with a natural metric on the state-action space, hence extending efficient model-free Q-learning algorithms to continuous state-action spaces. Compared to previous model-based RL algorithms for metric spaces [Kakade, Kearns, Langford 03], our algorithm does not require access to a black-box planning oracle.
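A uniform-grid caricature of the approach, assuming a Lipschitz reward on [0, 1]^2: discretize the state-action space with a fixed epsilon-net and run optimistic Q-learning over the cells, with a count-based bonus in the spirit of the cited tabular algorithm. The paper's algorithm is more refined (and episodic); the learning rate, bonus, and toy dynamics here are loose adaptations of ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# State and action both live in [0, 1]; fix a uniform epsilon-net over
# the state-action space (the paper's algorithm adapts the net; a fixed
# grid keeps the sketch short).
eps_net = 0.1
n = int(1 / eps_net)
gamma = 0.9
H = 1.0 / (1 - gamma)              # crude effective-horizon scale

Q = np.full((n, n), H)             # optimistic initialization
N = np.zeros((n, n))

def cell(x):
    return min(int(x / eps_net), n - 1)

def reward(s, a):                  # a Lipschitz toy reward
    return 1.0 - abs(a - 0.5 * s)

s = rng.random()
for t in range(50000):
    i = cell(s)
    j = int(Q[i].argmax())                 # greedy over discretized actions
    a = (j + rng.random()) * eps_net       # any representative of the cell
    r = reward(s, a)
    s_next = rng.random()                  # toy (uniform) transitions
    N[i, j] += 1
    alpha = (H + 1) / (H + N[i, j])        # rate in the spirit of the cited work
    bonus = 1.0 / np.sqrt(N[i, j])         # optimism in the face of uncertainty
    Q[i, j] += alpha * (r + bonus + gamma * Q[cell(s_next)].max() - Q[i, j])
    s = s_next

print("greedy action for s = 0.8:", (int(Q[cell(0.8)].argmax()) + 0.5) * eps_net)
```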